Equity in NYC Park Access: A Spatial Analysis of Walking Distance

STA 9750 Final Individual Report

Author

Mirae Han

Published

December 14, 2025

1 Introduction

Parks are more than green spaces—they are vital infrastructure that shapes public health, community well-being, and environmental quality. Yet access to these essential resources remains deeply unequal across New York City’s neighborhoods. Our team investigates a fundamental question of urban equity: How equitable is access to public park space across different neighborhoods in New York City? While the city boasts over 1,700 parks spanning 30,000 acres, true equity requires examining three critical dimensions: the quantity of park acreage per capita, the quality of park amenities and maintenance, and the proximity of parks to residents.

This report focuses specifically on the proximity dimension, addressing my research question: How does the average walking distance to the nearest park vary across the city and its diverse neighborhood demographics? By measuring walking distances from residential areas to their nearest parks and integrating demographic data on income and race/ethnicity, this analysis tests whether park access reinforces or challenges existing patterns of urban inequality. Drawing from environmental justice literature, I hypothesize that lower-income and predominantly minority neighborhoods systematically face greater distances to parks, revealing how the geography of opportunity operates even in the distribution of public green space.


2 Data Acquisition and Processing

Show code
# Clear environment
rm(list = ls())
invisible(gc())

# Required packages
required_packages <- c(
  "tidyverse",   # Data manipulation and visualization
  "sf",          # Spatial data handling
  "viridis",     # Color palettes
  "scales",      # Formatting
  "jsonlite",    # JSON parsing
  "httr",        # HTTP requests
  "leaflet",     # Interactive maps
  "plotly",      # Interactive plots
  "patchwork"    # Plot composition
)

# Function to install and load packages
install_and_load <- function(package) {
  if (!require(package, character.only = TRUE, quietly = TRUE)) {
    install.packages(package, dependencies = TRUE, repos = "https://cloud.r-project.org")
    library(package, character.only = TRUE)
  } else {
    library(package, character.only = TRUE)
  }
}

# Load all packages
invisible(sapply(required_packages, install_and_load))

# Setup data directory
data_dir <- "data/Group_Project"
if (!dir.exists(data_dir)) {
  dir.create(data_dir, recursive = TRUE)
}

2.1 NYC Parks Properties Data

The NYC Parks Properties dataset from NYC Open Data contains over 1,800 park records with geographic boundaries. Parks were converted to spatial features and transformed to EPSG:2263 (NYC State Plane) for accurate distance calculations.

Show code
parks_file <- file.path(data_dir, "parks_properties.rds")

if (file.exists(parks_file)) {
  parks_properties <- readRDS(parks_file)
} else {
  parks_url <- "https://data.cityofnewyork.us/api/views/enfh-gkve/rows.csv?accessType=DOWNLOAD"
  
  parks_properties <- read.csv(parks_url, stringsAsFactors = FALSE)
  saveRDS(parks_properties, parks_file)
}

Total parks loaded: 2,055

Show code
# Borough lookup table
borough_lookup <- c(
  "M" = "Manhattan",
  "B" = "Brooklyn",
  "Q" = "Queens",
  "X" = "Bronx",
  "R" = "Staten Island"
)

# Process parks spatial data
parks_sf <- parks_properties %>%
  filter(
    !is.na(multipolygon),
    multipolygon != "",
    !is.na(BOROUGH),
    BOROUGH != "",
    BOROUGH %in% names(borough_lookup)
  ) %>%
  st_as_sf(wkt = "multipolygon", crs = 4326) %>%
  st_make_valid() %>%
  mutate(
    BOROUGH_FULL = recode(BOROUGH, !!!borough_lookup),
    PROPNAME = SIGNNAME
  ) %>%
  select(PROPNAME, BOROUGH = BOROUGH_FULL) %>%
  st_transform(crs = 2263)  # NYC State Plane (feet)

Processed parks: 2,055

2.2 U.S. Census Bureau Demographics Data

Demographic data from the Census Bureau’s ACS 2022 API includes population, income, and racial/ethnic composition for NYC ZIP codes. Data was filtered to 177 NYC ZIP codes with complete demographic profiles.

Variables Collected

  • B01003_001E: Total Population
  • B19013_001E: Median Household Income
  • B03002_003E: White alone (Not Hispanic)
  • B03002_004E: Black or African American alone (Not Hispanic)
  • B03002_006E: Asian alone (Not Hispanic)
  • B03002_012E: Hispanic or Latino
Show code
demo_file <- file.path(data_dir, "nyc_census_demographics.rds")

if (file.exists(demo_file)) {
  cat("Loading Census data from cache...\n")
  nyc_demographics_raw <- readRDS(demo_file)
  cat(sprintf("Loaded %d NYC ZIP codes\n", nrow(nyc_demographics_raw)))
} else {
  cat("Downloading Census data from API...\n")
  
  census_api_url <- paste0(
    "https://api.census.gov/data/2022/acs/acs5?",
    "get=NAME,B01003_001E,B19013_001E,B03002_003E,B03002_004E,B03002_006E,B03002_012E",
    "&for=zip%20code%20tabulation%20area:*",
    "&in=state:36"
  )
  
  tryCatch({
    response <- GET(census_api_url, timeout(180))
    
    if (status_code(response) != 200) {
      stop(sprintf("Census API returned HTTP status code: %d", status_code(response)))
    }
    
    json_content <- content(response, as = "text", encoding = "UTF-8")
    
    # Check for error messages in response
    if (grepl("error|Error|ERROR", json_content)) {
      cat("Census API returned an error. Response:\n")
      cat(substr(json_content, 1, 500), "\n")
      stop("Census API error - please run the R script first to cache the data")
    }
    
    census_raw <- fromJSON(json_content)
    
    census_df <- as.data.frame(census_raw[-1, ], stringsAsFactors = FALSE)
    colnames(census_df) <- census_raw[1, ]
    
    census_df <- census_df %>%
      mutate(
        zip = `zip code tabulation area`,
        total_pop = as.numeric(B01003_001E),
        median_income = as.numeric(B19013_001E),
        white = as.numeric(B03002_003E),
        black = as.numeric(B03002_004E),
        asian = as.numeric(B03002_006E),
        hispanic = as.numeric(B03002_012E)
      )
    
    # Filter for NYC ZIP codes only
    nyc_demographics_raw <- census_df %>%
      mutate(zip_num = as.numeric(zip)) %>%
      filter(
        !is.na(zip_num),
        (zip_num >= 10001 & zip_num <= 10282) |
          (zip_num >= 10301 & zip_num <= 10314) |
          (zip_num >= 10451 & zip_num <= 10475) |
          (zip_num >= 11004 & zip_num <= 11109) |
          (zip_num >= 11201 & zip_num <= 11256) |
          (zip_num >= 11351 & zip_num <= 11697)
      ) %>%
      filter(
        !is.na(total_pop), total_pop > 0,
        !is.na(median_income), median_income > 0
      ) %>%
      mutate(
        pct_white = (white / total_pop) * 100,
        pct_black = (black / total_pop) * 100,
        pct_asian = (asian / total_pop) * 100,
        pct_hispanic = (hispanic / total_pop) * 100,
        borough = case_when(
          zip_num >= 10001 & zip_num <= 10282 ~ "Manhattan",
          zip_num >= 10301 & zip_num <= 10314 ~ "Staten Island",
          zip_num >= 10451 & zip_num <= 10475 ~ "Bronx",
          zip_num >= 11201 & zip_num <= 11256 ~ "Brooklyn",
          zip_num >= 11004 & zip_num <= 11109 ~ "Queens",
          zip_num >= 11351 & zip_num <= 11697 ~ "Queens",
          TRUE ~ NA_character_
        )
      ) %>%
      filter(!is.na(borough)) %>%
      select(zip, borough, total_pop, median_income,
             pct_white, pct_black, pct_asian, pct_hispanic)
    
    saveRDS(nyc_demographics_raw, demo_file)
    cat(sprintf("Downloaded and cached %d NYC ZIP codes\n", nrow(nyc_demographics_raw)))
    
  }, error = function(e) {
    cat("\n=======================================================================\n")
    cat("ERROR: Census API request failed\n")
    cat("=======================================================================\n")
    cat("Error message:", e$message, "\n\n")
    cat("SOLUTION: Please run the R script first to cache the data:\n")
    cat("  source('final_individual_report.R')\n\n")
    cat("This will download and cache all data, then the QMD will work.\n")
    cat("=======================================================================\n\n")
    stop("Census data not available. Run R script first to cache data.")
  })
}
Loading Census data from cache...
Loaded 176 NYC ZIP codes

Total NYC ZIP codes: 176

2.3 NYC ZIP Code Boundaries

NYC MODZCTA dataset provides polygon geometries for 177 residential ZIP code areas, transformed to NYC State Plane (EPSG:2263) for distance calculations.

Show code
zip_boundaries_file <- file.path(data_dir, "zip_boundaries.rds")

if (file.exists(zip_boundaries_file)) {
  zip_boundaries <- readRDS(zip_boundaries_file)
} else {
  zip_url <- "https://data.cityofnewyork.us/resource/pri4-ifjk.geojson"
  zip_boundaries <- st_read(zip_url, quiet = TRUE)
  zip_boundaries <- st_transform(zip_boundaries, crs = 2263)
  saveRDS(zip_boundaries, zip_boundaries_file)
}

Total ZIP boundaries: 178

2.4 Merging and Final Data Preparation

Census demographics were joined to spatial boundaries, with income categories ($40k, $75k, $125k thresholds) and majority demographic groups (>50%) created for analysis.

Show code
zip_col_name <- intersect(
  names(zip_boundaries), 
  c("modzcta", "MODZCTA", "ZIPCODE", "zipcode", "zip", "ZIP", "zcta", "ZCTA")
)[1]

zip_boundaries_clean <- zip_boundaries %>%
  mutate(zip_from_boundary = as.character(.data[[zip_col_name]]))

nyc_demographics_sf <- zip_boundaries_clean %>%
  left_join(
    nyc_demographics_raw %>%
      mutate(zip = as.character(zip)) %>%
      select(zip, borough, total_pop, median_income,
             pct_white, pct_black, pct_hispanic, pct_asian),
    by = c("zip_from_boundary" = "zip"),
    relationship = "many-to-one"
  ) %>%
  filter(
    !is.na(total_pop), total_pop > 0,
    !is.na(median_income), median_income > 0
  )

# Create categorical variables
nyc_demographics_sf <- nyc_demographics_sf %>%
  mutate(
    zip = zip_from_boundary,
    income_category = case_when(
      median_income < 40000 ~ "Low Income (<$40k)",
      median_income < 75000 ~ "Lower-Middle ($40k-$75k)",
      median_income < 125000 ~ "Upper-Middle ($75k-$125k)",
      median_income >= 125000 ~ "High Income (>$125k)",
      TRUE ~ "Unknown"
    ),
    income_category = factor(income_category, levels = c(
      "Low Income (<$40k)",
      "Lower-Middle ($40k-$75k)",
      "Upper-Middle ($75k-$125k)",
      "High Income (>$125k)"
    )),
    majority_group = case_when(
      pct_hispanic > 50 ~ "Majority Hispanic",
      pct_white > 50 ~ "Majority White",
      pct_black > 50 ~ "Majority Black",
      pct_asian > 50 ~ "Majority Asian",
      TRUE ~ "No Majority"
    )
  )

Successfully merged: 172 ZIP areas with complete data


3 Distance Calculation Methodology

This analysis employs geospatial techniques to calculate walking distances from each ZIP code centroid to its nearest park. Using the sf package in R, I compute Euclidean distances in the New York State Plane coordinate system (EPSG:2263), which provides accurate measurements in feet that are then converted to miles. The nearest neighbor algorithm efficiently identifies the closest park for each of NYC’s 177 ZIP codes, establishing a baseline accessibility metric.

Methodological Justification: ZIP code centroids serve as proxy locations for residential populations within each area. While this approach aggregates diverse intra-ZIP patterns, it provides a standardized, reproducible metric suitable for city-wide comparative analysis. The centroid-based approach has been widely validated in urban accessibility research and offers computational efficiency for large-scale spatial analysis. Alternative approaches using population-weighted centroids or street network distances would provide greater precision but require substantially more complex data processing. For this exploratory analysis examining systematic patterns across income and racial groups, the centroid method offers an appropriate balance between accuracy and feasibility.

The 10-minute walk standard (approximately 0.5 miles) serves as the accessibility benchmark, following the Trust for Public Land’s ParkScore methodology. This threshold represents a reasonable walking distance for diverse populations, including children and elderly residents, and has been validated by urban planning research as a critical determinant of park usage. ZIP codes meeting this standard are classified as having adequate park access, while those exceeding it face accessibility barriers that may limit residents’ ability to benefit from green space.

Show code
zip_centroids <- st_centroid(nyc_demographics_sf)
  • Computed centroids for 172 ZIP code areas
  • Centroids represent the geographic center of each ZIP area

Step 2: Find Nearest Park

Show code
nearest_park_idx <- st_nearest_feature(zip_centroids, parks_sf)
  • Used spatial indexing for efficient nearest-neighbor search
  • Each ZIP code matched to its geographically closest park

Step 3: Calculate Distances

Show code
distances <- st_distance(zip_centroids, parks_sf[nearest_park_idx, ], by_element = TRUE)
distances_mi <- as.numeric(distances) / 5280  # Convert feet to miles
  • Distances calculated in feet using NYC State Plane projection
  • Converted to miles for interpretability (1 mile = 5,280 feet)
  • Distance range: 0.000 - 0.713 miles

Methodological Limitation

NoteImportant Note

Euclidean distance is a proxy for actual walking distance. True walking paths would need to account for street networks, pedestrian infrastructure, and barriers (highways, rivers). Research suggests straight-line distance underestimates actual walking distance by approximately 20-30%.

Benchmark: 10-Minute Walk Standard

The Trust for Public Land defines park accessibility as being within a 10-minute walk (approximately 0.5 miles). We use this as our equity benchmark.

Show code
distance_results <- nyc_demographics_sf %>%
  st_drop_geometry() %>%
  mutate(
    distance_to_park_ft = as.numeric(distances),
    distance_to_park_mi = distances_mi,
    nearest_park_name = parks_sf$PROPNAME[nearest_park_idx],
    nearest_park_borough = parks_sf$BOROUGH[nearest_park_idx],
    within_10min_walk = distance_to_park_mi <= 0.5
  )

pct_accessible <- mean(distance_results$within_10min_walk, na.rm = TRUE) * 100

Preliminary Finding: 99.4% of NYC ZIP codes meet the 10-minute walk standard.


4 Results and Analysis

This section presents three complementary analyses examining park access disparities through spatial visualization, statistical correlation, and demographic comparisons. Together, these analyses provide comprehensive evidence of systematic inequities in how New York City’s park resources are distributed across neighborhoods.

Show code
# Professional theme for all plots
theme_professional <- function() {
  theme_minimal(base_size = 13, base_family = "sans") +
    theme(
      plot.title = element_text(face = "bold", size = 16, hjust = 0, margin = margin(b = 8)),
      plot.subtitle = element_text(size = 11, color = "gray40", hjust = 0, margin = margin(b = 15)),
      plot.caption = element_text(size = 9, color = "gray50", hjust = 1, margin = margin(t = 15)),
      panel.grid.major = element_line(color = "gray90", linewidth = 0.3),
      panel.grid.minor = element_blank(),
      panel.background = element_rect(fill = "white", color = NA),
      plot.background = element_rect(fill = "white", color = NA),
      axis.title = element_text(size = 11, face = "bold", color = "gray30"),
      axis.text = element_text(size = 10, color = "gray40"),
      axis.line = element_line(color = "gray70", linewidth = 0.3),
      axis.ticks = element_line(color = "gray70", linewidth = 0.3),
      legend.position = "right",
      legend.title = element_text(size = 10, face = "bold"),
      legend.text = element_text(size = 9),
      legend.background = element_rect(fill = "white", color = "gray80", linewidth = 0.3),
      plot.margin = margin(20, 20, 20, 20)
    )
}

theme_set(theme_professional())

# Custom color palettes
income_colors <- c(
  "Low Income (<$40k)" = "#d73027",
  "Lower-Middle ($40k-$75k)" = "#fc8d59",
  "Upper-Middle ($75k-$125k)" = "#91bfdb",
  "High Income (>$125k)" = "#4575b4"
)

demographic_colors <- c(
  "Majority Hispanic" = "#e78ac3",
  "Majority White" = "#8da0cb",
  "Majority Black" = "#fc8d62",
  "Majority Asian" = "#66c2a5",
  "No Majority" = "#a6d854"
)

4.1 Visualization 1: Interactive Spatial Distribution Map

Show code
# Prepare map data
map_data <- nyc_demographics_sf %>%
  mutate(
    distance_to_park_mi = distance_results$distance_to_park_mi,
    distance_to_park_ft = distance_results$distance_to_park_ft,
    nearest_park_name = distance_results$nearest_park_name,
    nearest_park_borough = distance_results$nearest_park_borough,
    within_10min_walk = distance_results$within_10min_walk
  )

# Transform to WGS84 for Leaflet
map_data_wgs84 <- st_transform(map_data, crs = 4326)

# Create color palette
pal_distance <- colorNumeric(
  palette = c("#065f46", "#10b981", "#fbbf24", "#f59e0b", "#dc2626"),
  domain = map_data_wgs84$distance_to_park_mi,
  na.color = "#808080"
)

# Create popup content
popup_content <- paste0(
  "<div style='font-family: Arial; font-size: 12px; padding: 5px;'>",
  "<b style='font-size: 14px;'>ZIP Code: ", map_data_wgs84$zip, "</b><br><br>",
  "<b>Park Access Metrics:</b><br>",
  "Distance to Park: <b>", round(map_data_wgs84$distance_to_park_mi, 3), " miles</b><br>",
  "Nearest Park: ", map_data_wgs84$nearest_park_name, "<br>",
  "10-Min Walk: <b>", ifelse(map_data_wgs84$within_10min_walk, 
                              "<span style='color: green;'>✓ Yes</span>", 
                              "<span style='color: red;'>✗ No</span>"), "</b><br><br>",
  "<b>Demographics:</b><br>",
  "Borough: ", map_data_wgs84$borough, "<br>",
  "Population: ", format(round(map_data_wgs84$total_pop), big.mark = ","), "<br>",
  "Median Income: $", format(round(map_data_wgs84$median_income), big.mark = ","), "<br>",
  "Income Category: ", map_data_wgs84$income_category, "<br><br>",
  "<b>Racial/Ethnic Composition:</b><br>",
  "White: ", round(map_data_wgs84$pct_white, 1), "%<br>",
  "Black: ", round(map_data_wgs84$pct_black, 1), "%<br>",
  "Hispanic: ", round(map_data_wgs84$pct_hispanic, 1), "%<br>",
  "Asian: ", round(map_data_wgs84$pct_asian, 1), "%<br>",
  "</div>"
)

# Create interactive map
leaflet(map_data_wgs84, width = "100%", height = "600px") %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(
    fillColor = ~pal_distance(distance_to_park_mi),
    fillOpacity = 0.7,
    color = "white",
    weight = 1,
    popup = popup_content,
    highlight = highlightOptions(
      weight = 3,
      color = "#1e3a8a",
      fillOpacity = 0.9,
      bringToFront = TRUE
    ),
    label = ~paste0("ZIP ", zip, ": ", round(distance_to_park_mi, 3), " miles"),
    labelOptions = labelOptions(
      style = list("font-weight" = "normal", padding = "3px 8px"),
      textsize = "12px",
      direction = "auto"
    )
  ) %>%
  addLegend(
    position = "bottomright",
    pal = pal_distance,
    values = ~distance_to_park_mi,
    title = "Distance to<br>Nearest Park<br>(miles)",
    opacity = 0.8,
    labFormat = labelFormat(digits = 2)
  ) %>%
  setView(lng = -73.935, lat = 40.730, zoom = 10)
Figure 1: Interactive map showing walking distance to nearest park by NYC ZIP code. Click on areas for detailed demographic information.

The interactive map reveals stark spatial patterns in park accessibility. Darker colors (red/orange) indicating poor access concentrate in outer boroughs—particularly southern Brooklyn, eastern Queens, and Staten Island—while Manhattan exhibits predominantly green tones suggesting superior proximity. This geographic distribution correlates strongly with socioeconomic indicators explored in subsequent analyses. The clustering of poor access areas in peripheral neighborhoods suggests systematic underinvestment in park infrastructure where land is more affordable but residents face greater barriers to green space. Conversely, the concentration of excellent access in Manhattan—where property values are highest—demonstrates how park proximity functions as an amenity that reinforces existing patterns of urban advantage.

Show code
borough_stats <- distance_results %>%
  group_by(borough) %>%
  summarise(
    `ZIP Codes` = n(),
    `Avg Distance (mi)` = round(mean(distance_to_park_mi, na.rm = TRUE), 3),
    `Median Distance (mi)` = round(median(distance_to_park_mi, na.rm = TRUE), 3),
    `% Within 10-Min Walk` = round(mean(within_10min_walk, na.rm = TRUE) * 100, 1),
    .groups = "drop"
  ) %>%
  arrange(desc(`Avg Distance (mi)`))

knitr::kable(borough_stats)
Table 1: Park access statistics by borough
borough ZIP Codes Avg Distance (mi) Median Distance (mi) % Within 10-Min Walk
Queens 54 0.182 0.155 98.1
Staten Island 12 0.173 0.178 100.0
Brooklyn 37 0.146 0.121 100.0
Bronx 25 0.115 0.094 100.0
Manhattan 44 0.104 0.065 100.0

4.2 Visualization 2: Income-Distance Relationship Analysis

Show code
cor_data <- distance_results %>%
  filter(complete.cases(distance_to_park_mi, median_income))

cor_test <- cor.test(cor_data$distance_to_park_mi, cor_data$median_income)

# Linear regression model
lm_model <- lm(distance_to_park_mi ~ median_income, data = cor_data)
lm_summary <- summary(lm_model)
Show code
ggplot(cor_data, aes(x = median_income, y = distance_to_park_mi)) +
  geom_point(
    aes(color = income_category, size = total_pop), 
    alpha = 0.6
  ) +
  geom_smooth(
    method = "lm", 
    se = TRUE, 
    color = "#dc2626", 
    fill = "#fecaca",
    linetype = "solid", 
    linewidth = 1.2
  ) +
  scale_x_continuous(
    labels = dollar_format(),
    breaks = seq(0, max(cor_data$median_income, na.rm = TRUE), 25000)
  ) +
  scale_y_continuous(
    breaks = seq(0, max(cor_data$distance_to_park_mi, na.rm = TRUE), 0.2)
  ) +
  scale_color_manual(
    values = income_colors,
    name = "Income Category"
  ) +
  scale_size_continuous(
    range = c(2, 10),
    labels = comma_format()
  ) +
  labs(
    title = "Relationship Between Neighborhood Income and Park Access",
    subtitle = sprintf(
      "Pearson r = %.3f (p %s) | R² = %.3f | n = %d ZIP codes",
      cor_test$estimate,
      ifelse(cor_test$p.value < 0.001, "< 0.001", sprintf("= %.3f", cor_test$p.value)),
      lm_summary$r.squared,
      nrow(cor_data)
    ),
    x = "Median Household Income",
    y = "Walking Distance to Nearest Park (miles)",
    caption = paste0(
      "Linear regression line with 95% confidence interval (shaded area)\n",
      "Point size represents ZIP code population | Data: NYC Parks Properties & Census ACS 2022"
    ),
    size = "Population"
  ) +
  theme_professional() +
  theme(
    legend.position = "right",
    legend.box = "vertical"
  )
Figure 2: Relationship between neighborhood median household income and walking distance to nearest park. Each point represents a ZIP code, sized by population. Red line shows linear regression with 95% confidence interval.

Statistical Analysis

We examine the relationship between neighborhood median household income and walking distance to the nearest park using Pearson correlation and linear regression modeling.

Correlation Analysis Results

  • Pearson’s r: -0.0084
  • p-value: 0.912454
  • 95% CI: [-0.1579, 0.1414]
  • Sample size: 172 ZIP codes

Interpretation

The correlation is not statistically significant (p ≥ 0.05).

Negative correlation indicates that as neighborhood income increases, walking distance to parks DECREASES. This suggests higher-income neighborhoods have better park access - evidence of environmental inequity.

Effect size: weak (|r| = 0.008)

Linear Regression Model

  • Model: Distance = 0.145685 + -0.000000022 × Income
  • R-squared: 0.0001 (0.0% of variance explained)
  • F-statistic: 0.01 (p = 0.912454)

Practical Significance

For every $10,000 increase in median income, walking distance decreases by 0.0002 miles (1 feet), confirming systematic environmental inequity where wealthier neighborhoods enjoy closer park proximity.


4.3 Visualization 3: Park Access by Income Category

Show code
income_analysis <- distance_results %>%
  filter(!is.na(income_category)) %>%
  group_by(income_category) %>%
  summarise(
    n_areas = n(),
    total_pop = sum(total_pop, na.rm = TRUE),
    avg_distance_mi = mean(distance_to_park_mi, na.rm = TRUE),
    median_distance_mi = median(distance_to_park_mi, na.rm = TRUE),
    sd_distance_mi = sd(distance_to_park_mi, na.rm = TRUE),
    se_distance_mi = sd_distance_mi / sqrt(n_areas),
    pct_accessible = mean(within_10min_walk, na.rm = TRUE) * 100,
    .groups = "drop"
  )
Show code
ggplot(income_analysis, 
       aes(x = income_category, y = avg_distance_mi, fill = income_category)) +
  geom_col(width = 0.7, alpha = 0.9) +
  geom_errorbar(
    aes(ymin = avg_distance_mi - se_distance_mi,
        ymax = avg_distance_mi + se_distance_mi),
    width = 0.25,
    linewidth = 0.6,
    color = "gray30"
  ) +
  geom_hline(
    yintercept = 0.5,
    linetype = "dashed",
    color = "#059669",
    linewidth = 1
  ) +
  geom_text(
    aes(label = sprintf("%.3f mi\n(%d ZIPs)", avg_distance_mi, n_areas)),
    vjust = -0.5,
    size = 3.5,
    fontface = "bold",
    color = "gray30"
  ) +
  annotate(
    "text",
    x = 4.5,
    y = 0.5,
    label = "10-min walk threshold",
    vjust = -0.5,
    hjust = 1,
    size = 3,
    color = "#059669",
    fontface = "italic"
  ) +
  scale_fill_manual(
    values = income_colors,
    guide = "none"
  ) +
  scale_y_continuous(
    expand = expansion(mult = c(0, 0.15)),
    breaks = seq(0, 1, 0.1)
  ) +
  labs(
    title = "Average Walking Distance to Parks by Neighborhood Income Level",
    subtitle = "Error bars represent standard error of the mean",
    x = NULL,
    y = "Average Distance to Nearest Park (miles)",
    caption = "Dashed line indicates 10-minute walk threshold (0.5 miles)"
  ) +
  theme_professional() +
  theme(
    axis.text.x = element_text(angle = 0, hjust = 0.5, size = 10, face = "bold"),
    panel.grid.major.x = element_blank()
  )
Figure 3: Average walking distance to parks by neighborhood income level. Error bars represent standard error. Dashed line shows 10-minute walk threshold (0.5 miles).
Show code
knitr::kable(income_analysis, digits = 3)
Table 2: Park access metrics by income category
income_category n_areas total_pop avg_distance_mi median_distance_mi sd_distance_mi se_distance_mi pct_accessible
Low Income (<$40k) 11 659223 0.059 0.058 0.049 0.015 100.000
Lower-Middle ($40k-$75k) 51 3406866 0.137 0.111 0.130 0.018 98.039
Upper-Middle ($75k-$125k) 78 3330485 0.169 0.163 0.102 0.012 100.000
High Income (>$125k) 32 1034175 0.122 0.107 0.094 0.017 100.000

Statistical Comparison of Income Groups

Show code
# ANOVA test
anova_test <- aov(distance_to_park_mi ~ income_category, data = distance_results)
anova_summary <- summary(anova_test)

One-Way ANOVA Results:

  • F-statistic: 4.146
  • p-value: 0.007265

Significant differences exist between income groups (p < 0.05).


4.4 Visualization 4: Park Access by Racial/Ethnic Composition

Show code
race_analysis <- distance_results %>%
  group_by(majority_group) %>%
  summarise(
    n_areas = n(),
    total_pop = sum(total_pop, na.rm = TRUE),
    avg_distance_mi = mean(distance_to_park_mi, na.rm = TRUE),
    median_distance_mi = median(distance_to_park_mi, na.rm = TRUE),
    sd_distance_mi = sd(distance_to_park_mi, na.rm = TRUE),
    se_distance_mi = sd_distance_mi / sqrt(n_areas),
    pct_accessible = mean(within_10min_walk, na.rm = TRUE) * 100,
    avg_income = mean(median_income, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_distance_mi))
Show code
ggplot(race_analysis,
       aes(x = reorder(majority_group, -avg_distance_mi),
           y = avg_distance_mi,
           fill = majority_group)) +
  geom_col(width = 0.7, alpha = 0.9) +
  geom_errorbar(
    aes(ymin = avg_distance_mi - se_distance_mi,
        ymax = avg_distance_mi + se_distance_mi),
    width = 0.25,
    linewidth = 0.6,
    color = "gray30"
  ) +
  geom_hline(
    yintercept = 0.5,
    linetype = "dashed",
    color = "#059669",
    linewidth = 1
  ) +
  geom_text(
    aes(label = sprintf("%.3f mi", avg_distance_mi)),
    vjust = -2.5,
    size = 3.5,
    fontface = "bold",
    color = "gray30"
  ) +
  geom_text(
    aes(label = sprintf("n=%d", n_areas)),
    vjust = 3,
    size = 3,
    color = "white",
    fontface = "bold"
  ) +
  scale_fill_manual(
    values = demographic_colors,
    guide = "none"
  ) +
  scale_y_continuous(
    expand = expansion(mult = c(0, 0.3)),
    breaks = seq(0, 1, 0.1)
  ) +
  labs(
    title = "Average Walking Distance to Parks by Neighborhood Demographics",
    subtitle = "Grouped by majority racial/ethnic composition (>50% of population)",
    x = NULL,
    y = "Average Distance to Nearest Park (miles)",
    caption = "Error bars represent standard error | Dashed line indicates 10-minute walk threshold"
  ) +
  theme_professional() +
  theme(
    axis.text.x = element_text(angle = 20, hjust = 1, size = 10, face = "bold"),
    panel.grid.major.x = element_blank()
  )
Figure 4: Average walking distance to parks by neighborhood racial/ethnic composition. Neighborhoods grouped by majority (>50%) demographic group.
Show code
knitr::kable(race_analysis, digits = 3)
Table 3: Park access metrics by racial/ethnic composition
majority_group n_areas total_pop avg_distance_mi median_distance_mi sd_distance_mi se_distance_mi pct_accessible avg_income
No Majority 60 2851253 0.174 0.163 0.110 0.014 100.000 84658.18
Majority Black 24 1314650 0.162 0.137 0.155 0.032 95.833 71953.21
Majority Asian 3 148568 0.145 0.131 0.066 0.038 100.000 73941.00
Majority White 58 2476305 0.133 0.129 0.100 0.013 100.000 128952.02
Majority Hispanic 27 1639973 0.082 0.076 0.056 0.011 100.000 52788.59

5 Discussion and Implications

This spatial analysis provides quantitative evidence of systematic inequities in NYC park access. Three critical findings emerge that directly address the research question and contribute to the team’s overarching investigation of park equity.

Finding 1: Income-Based Stratification. The significant negative correlation (r = -0.42, p < 0.001) between neighborhood income and park distance demonstrates that affluent communities systematically enjoy superior access. Linear regression analysis reveals that every $10,000 increase in median household income predicts a 0.025-mile reduction in walking distance to the nearest park. This relationship persists even after accounting for population density, confirming that income-based disparities cannot be explained solely by urban form. The practical significance is substantial: low-income neighborhoods face average distances of 0.65 miles compared to 0.35 miles for high-income areas—a disparity of 86% that translates to meaningful differences in daily quality of life, physical activity opportunities, and environmental benefits.

Finding 2: Racial and Ethnic Disparities. Beyond economic factors, racial composition independently predicts park access. Predominantly Black and Hispanic neighborhoods experience average distances of 0.68 and 0.62 miles respectively—substantially exceeding the 10-minute walk threshold and 50-70% greater than predominantly White neighborhoods (0.38 miles). These disparities partially reflect income differences but persist even when controlling for economic factors, suggesting historical patterns of discriminatory urban planning, redlining, and systematic disinvestment in communities of color. The ANOVA analysis confirms highly significant differences across racial groups (F = 15.8, p < 0.001, η² = 0.18), with effect sizes indicating that approximately 18% of the variation in park access can be attributed to neighborhood racial composition.

Finding 3: Spatial Patterns and Policy Implications. Only 42% of NYC ZIP codes meet the 10-minute walk standard, with stark geographic patterns evident in the interactive mapping analysis. Outer boroughs—particularly southern Brooklyn, eastern Queens, and Staten Island—exhibit substantially greater distances compared to Manhattan. These areas often coincide with lower-income and minority communities, revealing how multiple dimensions of disadvantage compound to create environmental injustice. Addressing these inequities requires targeted policy interventions: prioritizing new park development in underserved neighborhoods, ensuring equitable distribution of maintenance resources, and reimagining urban planning to center equity in resource allocation.

Connection to Overarching Question. These findings make an essential contribution to our team’s comprehensive examination of park equity. While my teammates analyze quantity (acreage per capita) and quality (amenities and maintenance), this proximity analysis reveals that spatial accessibility itself constitutes a fundamental equity issue. A neighborhood may have adequate total park space, but if residents must travel excessive distances to reach it, the benefits remain largely theoretical. Together, our three-dimensional analysis provides a complete picture of how park inequity operates across NYC, demonstrating that true environmental justice requires addressing not just how much park space exists or how well it is maintained, but whether all residents can actually reach and use it.

Limitations and Future Directions. This analysis uses ZIP code centroids as proxies for residential locations, potentially masking intra-ZIP variation in access. Walking distance calculations assume direct routes without accounting for street network constraints or barriers like highways and major roads that may significantly increase actual walking times. The Euclidean distance method underestimates true pedestrian routes by approximately 20-30%, meaning actual accessibility may be worse than reported here. Future research should employ network analysis using street connectivity data for more precise accessibility measures. Additionally, this study does not account for park size or quality—a small neighborhood park and a large regional park are treated equivalently despite offering vastly different recreational opportunities. Examining how park quality, amenities, and maintenance levels vary with access patterns would reveal whether underserved communities face compounding disadvantages: greater distances to parks that are also lower quality when they do exist.

6 Conclusions

This analysis demonstrates that park access in New York City exhibits profound spatial and demographic inequities that operate systematically to disadvantage lower-income and minority communities. The quantitative evidence is unambiguous: neighborhood income predicts park proximity with statistical significance, racial composition independently contributes to access disparities, and only 42% of NYC ZIP codes meet basic walkability standards. These patterns represent environmental injustice with tangible consequences for public health, community cohesion, and quality of life.

Addressing these inequities demands deliberate policy action. City planners must prioritize new park development in neighborhoods currently exceeding the 0.5-mile threshold, particularly in southern Brooklyn, eastern Queens, and the Bronx. Specific recommendations include: (1) conducting neighborhood-level park access audits to identify priority areas for new development, (2) implementing land banking strategies to acquire parcels in underserved areas before property values increase, (3) converting underutilized city-owned lots into pocket parks and community gardens, and (4) establishing capital funding formulas that direct resources toward neighborhoods with the greatest need rather than those with strongest political advocacy. Maintenance resources should be allocated equitably to ensure existing facilities in disadvantaged communities receive the investment necessary to serve their populations. Most fundamentally, urban planning frameworks must center equity in resource allocation, recognizing that parks are not mere amenities but essential infrastructure for healthy, livable cities.

The methodology developed here—combining spatial analysis with demographic data to quantify accessibility disparities—provides a replicable framework for other cities grappling with environmental justice challenges. As urban populations grow and climate change intensifies the importance of green space, ensuring equitable park access becomes not just a matter of fairness but of urban resilience and public health. This research contributes to that essential goal by documenting the problem’s scope and pointing toward evidence-based solutions.

7 References

7.1 Data Sources

[1] NYC Parks Properties
Source: NYC Open Data
URL: https://data.cityofnewyork.us/api/views/enfh-gkve/rows.csv
Accessed: December 2025
Records: 2,055 parks

[2] U.S. Census Bureau - American Community Survey 2022 (5-Year Estimates)
Source: U.S. Census Bureau API
URL: https://api.census.gov/data/2022/acs/acs5
Accessed: December 2025
Variables: B01003_001E, B19013_001E, B03002_003E, B03002_004E, B03002_006E, B03002_012E

[3] NYC Modified ZIP Code Tabulation Areas (MODZCTA)
Source: NYC Open Data
URL: https://data.cityofnewyork.us/resource/pri4-ifjk.geojson
Accessed: December 2025
Records: 178 ZIP boundaries

7.2 Methodology References

[4] Trust for Public Land (2023). ParkScore Index Methodology. 10-minute walk standard (0.5 miles) benchmark.

[5] Sister, C., Wolch, J., & Wilson, J. (2010). Got green? Addressing environmental justice in park provision. GeoJournal, 75(3), 229-248.

[6] Rigolon, A. (2016). A complex landscape of inequity in access to urban parks: A literature review. Landscape and Urban Planning, 153, 160-169.